Chinese String Searching Using The KMP Algorithm

Author

  • Robert Wing Pong Luk
Abstract

This paper describes how to modify the KMP (Knuth, Morris and Pratt) algorithm for string searching in Chinese text. The difficulty lies in searching a text string that mixes single- and multi-byte characters. We show that the input must be properly decoded as a sequence of characters rather than a sequence of bytes. The standard KMP algorithm can easily be modified for Chinese string searching, but with a worst-case time complexity of O(3n) in terms of the number of comparisons. The finite-automaton implementation can achieve a worst-case time complexity of O(2n), but the cost of constructing its transition table depends on the size of the alphabet, Z, which is large for Chinese (for Big-5, Z > 13,000). A mapping technique reduces the size of the alphabet to at most |P|, where P is the pattern string.
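The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`decode_chars`, `failure_function`, `kmp_search`, `build_automaton`, `automaton_search`) are hypothetical, Python's built-in `big5` codec stands in for the decoding step, and the alphabet mapping shown here (one code per distinct pattern character plus a shared "other" code) is only one plausible reading of the mapping technique the abstract mentions.

```python
def decode_chars(data: bytes, encoding: str = "big5") -> list[str]:
    """Decode mixed single-/multi-byte text into a list of characters."""
    return list(data.decode(encoding))


def failure_function(pattern) -> list[int]:
    """Standard KMP failure function: longest proper border of each prefix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern) -> list[int]:
    """Return start offsets of all matches; works on any indexable sequence."""
    if not pattern:
        return []
    fail = failure_function(pattern)
    hits, k = [], 0
    for i, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = fail[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


def build_automaton(pattern):
    """Build a matching automaton over a reduced alphabet (an assumption of
    this sketch): distinct pattern characters get codes 1, 2, ..., and every
    other character shares code 0, so the table width never exceeds
    len(pattern) + 1 regardless of how large the real alphabet is."""
    code = {}
    for ch in pattern:
        code.setdefault(ch, len(code) + 1)
    m = len(pattern)
    fail = failure_function(pattern)
    delta = [[0] * (len(code) + 1) for _ in range(m + 1)]
    for state in range(m + 1):
        for ch, c in code.items():
            if state < m and pattern[state] == ch:
                delta[state][c] = state + 1
            elif state > 0:
                # Fall back to an earlier state whose row is already filled.
                delta[state][c] = delta[fail[state - 1]][c]
    return code, delta


def automaton_search(text, pattern) -> list[int]:
    """Scan the text with one table lookup per character."""
    if not pattern:
        return []
    code, delta = build_automaton(pattern)
    m, state, hits = len(pattern), 0, []
    for i, ch in enumerate(text):
        state = delta[state][code.get(ch, 0)]
        if state == m:
            hits.append(i - m + 1)
    return hits
```

As a usage example, searching the Big-5 bytes of "中文" for the bytes of "中" with `kmp_search` reports a spurious second match that straddles the two characters, because the trailing byte of "中" and the leading byte of "文" happen to coincide. Running the same search on the output of `decode_chars` reports only the genuine match at character offset 0, which is the reason the abstract insists on decoding the input as characters.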

Similar resources

Efficient validation and construction of Knuth–Morris–Pratt arrays

Knuth-Morris-Pratt (KMP) arrays are known as the "failure function" of the Knuth-Morris-Pratt string matching algorithm. We present an algorithm to check whether an integer array is a KMP array. This gives a method for computing all the distinct KMP arrays.

Chinese Spelling Check based on N-gram and String Matching Algorithm

This paper presents a Chinese spelling check approach based on language models combined with a string matching algorithm, aimed at errors that arise from the influence of Cantonese as the writer's mother tongue. N-grams are first used to estimate the probability of the sentences constructed by the writers, and a string matching algorithm, the Knuth-Morris-Pratt (KMP) algorithm, is then used to detect and correct the errors. ...

Multithreaded Implementation of Hybrid String Matching Algorithm

Drawing on many books and articles, this paper analyzes the naive algorithm, the Boyer-Moore algorithm, the Knuth-Morris-Pratt (KMP) algorithm and a variety of improved algorithms, and summarizes the advantages and disadvantages of these pattern matching algorithms. On this basis, a new algorithm, the Multithreaded Hybrid algorithm, is introduced. The algorithm refers to Boyer M...

Hardware based String Matching Algorithms: A Survey

Most string matching algorithms are software based, but some are hardware based. The main criterion for a string matching algorithm is its searching efficiency. In this paper we discuss hardware-based string matching algorithms such as Brute Force, KMP, and Aho-Corasick, together with their applications. There are different types of string matching algorithms which ...

Practical Algorithmic Techniques for Several String Processing Problems

The domains of data mining and knowledge discovery make use of large amounts of textual data, which need to be handled efficiently. Specific problems, like finding the maximum weight ordered common subset of a set of ordered sets or searching for specific patterns within texts, occur frequently in this context. In this paper we present several novel and practical algorithmic techniques for proc...

Publication date: 1996